


Institutional Archive of the Naval Postgraduate School 


Calhoun: The NPS Institutional Archive 
DSpace Repository 


Theses and Dissertations 1. Thesis and Dissertation Collection, all items 


1986 


A serial bus architecture for parallel 
processing systems. 


Delaney, Kevin J. 


This publication is a work of the U.S. Government as defined in Title 17, United 
States Code, Section 101. Copyright protection is not available for this work in the 
United States. 


Downloaded from NPS Archive: Calhoun 


Calhoun is the Naval Postgraduate School's public access digital repository for
research materials and institutional publications created by the NPS community.
Calhoun is named for Professor of Mathematics Guy K. Calhoun, NPS's first
appointed -- and published -- scholarly author.

Dudley Knox Library / Naval Postgraduate School

411 Dyer Road / 1 University Circle 
Monterey, California USA 93943 





http://www.nps.edu/library 


















NAVAL POSTGRADUATE SCHOOL 


Monterey, California





THESIS 


A SERIAL BUS ARCHITECTURE 
FOR PARALLEL PROCESSING SYSTEMS 


by 


Kevin J. Delaney 


September 1986 


Thesis Advisor: Larry W. Abbott





Approved for public release; distribution is unlimited. 







SECURITY CLASSIFICATION OF THIS PAGE

REPORT DOCUMENTATION PAGE

1a. REPORT SECURITY CLASSIFICATION: UNCLASSIFIED
1b. RESTRICTIVE MARKINGS:
2a. SECURITY CLASSIFICATION AUTHORITY:
2b. DECLASSIFICATION / DOWNGRADING SCHEDULE:
3. DISTRIBUTION / AVAILABILITY OF REPORT: Approved for public release;
   distribution is unlimited.
4. PERFORMING ORGANIZATION REPORT NUMBER(S):
5. MONITORING ORGANIZATION REPORT NUMBER(S):
6a. NAME OF PERFORMING ORGANIZATION: Naval Postgraduate School
6b. OFFICE SYMBOL (if applicable): 62
6c. ADDRESS (City, State, and ZIP Code): Monterey, California 93943-5000
7a. NAME OF MONITORING ORGANIZATION: Naval Postgraduate School
7b. ADDRESS (City, State, and ZIP Code): Monterey, California 93943-5000
8a. NAME OF FUNDING / SPONSORING ORGANIZATION:
8b. OFFICE SYMBOL (if applicable):
8c. ADDRESS (City, State, and ZIP Code):
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER:
10. SOURCE OF FUNDING NUMBERS (PROGRAM ELEMENT NO. / PROJECT NO. / TASK NO. /
    WORK UNIT ACCESSION NO.):
11. TITLE (Include Security Classification): A SERIAL BUS ARCHITECTURE FOR
    PARALLEL PROCESSING SYSTEMS
12. PERSONAL AUTHOR(S): Delaney, Kevin J.
13a. TYPE OF REPORT: Master's Thesis
13b. TIME COVERED: FROM      TO
14. DATE OF REPORT (Year, Month, Day): 1986 September
15. PAGE COUNT: 64
16. SUPPLEMENTARY NOTATION:
17. COSATI CODES:
18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number):
    Parallel processing, Optoelectronic Multiplexer
19. ABSTRACT (Continue on reverse if necessary and identify by block number):

One of the most serious deterrents to the development of multiple processor
architectures has been the problem of providing adequate communication between
the discrete processing elements. This paper examines two communications-based
constraints.

The first constraint is related to the physical structure of the VLSI chip. The
wider the communication path, the more pins are needed to effect the data
transfer. As Integrated Circuits grow in computational power, more communication
capacity is needed, pushing designs closer to the pin limitations of the
packaging technology.

The second constraint, somewhat related to the first, is the limited speed with
which data can be transmitted via internal channels. Typical speeds one can
achieve on a single wire are on the order of 1 Gbps. The

20. DISTRIBUTION / AVAILABILITY OF ABSTRACT: [X] UNCLASSIFIED/UNLIMITED
    [ ] SAME AS RPT.  [ ] DTIC USERS
21. ABSTRACT SECURITY CLASSIFICATION: UNCLASSIFIED
22a. NAME OF RESPONSIBLE INDIVIDUAL: Abbott, Larry W.
22b. TELEPHONE (Include Area Code): 408-646-2379
22c. OFFICE SYMBOL: 62At

DD FORM 1473, 84 MAR. 83 APR edition may be used until exhausted.
All other editions are obsolete.
SECURITY CLASSIFICATION OF THIS PAGE


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

19. (Continued)

recent development of an Optoelectronic Multiplexer may allow VLSI chips to
communicate at rates up to 7 Gbps. An architecture for a parallel processing
computer which takes advantage of this new capability is presented. The
feasibility of a single-chip parallel processor based on the Optoelectronic
Multiplexer is examined by projecting current trends in processor speed, power,
and transistor count into estimates of throughput for a multi-processor IC.

S/N 0102-LF-014-6601

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


Approved for public release; distribution is unlimited. 
A Serial Bus Architecture 
for Parallel Processing Systems 
by 
Kevin J. Delaney 


Lieutenant, United States Navy 
B.S.E.E., United States Naval Academy, 1979 


Submitted in partial fulfillment of the 
requirements for the degrees of 
MASTER OF SCIENCE IN ELECTRICAL ENGINEERING
and
ELECTRICAL ENGINEER


from the 


NAVAL POSTGRADUATE SCHOOL 
September 1986 


ABSTRACT


One of the most serious deterrents to the development of multiple processor
architectures has been the problem of providing adequate communication between the
discrete processing elements. This paper examines two communications-based
constraints.

The first constraint is related to the physical structure of the VLSI chip. The
wider the communication path, the more pins are needed to effect the data transfer. As
Integrated Circuits grow in computational power, more communication capacity is
needed, pushing designs closer to the pin limitations of the packaging technology.

The second constraint, somewhat related to the first, is the limited speed with
which data can be transmitted via internal channels. Typical speeds one can achieve
on a single wire are on the order of 1 Gbps. The recent development of an
Optoelectronic Multiplexer may allow VLSI chips to communicate at rates up to 7
Gbps. An architecture for a parallel processing computer which takes advantage of
this new capability is presented. The feasibility of a single-chip parallel processor
based on the Optoelectronic Multiplexer is examined by projecting current trends in
processor speed, power, and transistor count into estimates of throughput for a
multi-processor IC.


TABLE OF CONTENTS


I. INTRODUCTION
   A. THE NEED FOR PARALLEL PROCESSING
   B. PARALLEL PROCESSORS DEPEND ON COMMUNICATION
      1. Exhaustive Communications
      2. Limited Communications
   C. THE OPTOELECTRONIC MULTIPLEXER CONCEPT
      1. Optical Switching Yields High Speed
      2. A Suitable Architecture Sought
II. [illegible]
   A. PARTITIONING SILICON FOR MAXIMUM [illegible]
      1. [illegible] Constraints
      2. [illegible] Constraints
   B. [illegible] CHIP SIZE FOR OM APPLICATION
      1. [illegible] Transistor Count
      2. [illegible] Power Dissipation
III. [illegible] MULTIPLEXER
   A. [illegible] POWER LIMITED BY [illegible]
   B. [illegible]
IV. [illegible] ARCHITECTURE
   A. [illegible]
      1. [illegible] Architecture
      2. [illegible]
   B. [illegible]
      1. Pipeline Architecture
      2. Reuse Architecture
   C. RECEIVER TASKS
   D. TRANSMITTER TASKS
   E. CONCLUSIONS AND LIMITATIONS [illegible] RESEARCH
      1. Conclusions
      2. Limitations and Recommendations
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST


LIST OF TABLES


I. SPECIFICATIONS OF SOME ACTUAL PROCESSORS
II. [illegible]
III. SPECIFICATIONS OF SOME ACTUAL PROCESSORS
IV. [illegible]
V. INTER-PROCESSOR COMMUNICATIONS 4 X 1 REUSE [illegible]
VI. INTER-PROCESSOR COMMUNICATIONS 2 X 2 REUSE [illegible]
VII. INTER-PROCESSOR COMMUNICATIONS MODIFIED 4 X 1 [illegible]
VIII. PRESET SCHEDULE FOR DATA DISTRIBUTION PIPELINE [illegible]
IX. PRESET SCHEDULE FOR DATA DISTRIBUTION PIPELINE [illegible]
X. PRESET SCHEDULE FOR DATA DISTRIBUTION REUSE [illegible]
XI. PRESET SCHEDULE FOR DATA DISTRIBUTION REUSE [illegible]


LIST OF FIGURES


1.1 Exhaustive Communications
1.2 Limited Communications--Dedicated Path Loop
1.3 Limited Communications--Dedicated Path Regular Network
1.4 Limited Communications--Shared Path
1.5 Optoelectronic Multiplexer Block Diagram
2.1 Processor Speed and Complexity (Experimental)
2.2 Processor Speed and Number of Processors Based on a Constant
    Number of Transistors
2.3 Relationship Between Processor Capability and System
    Requirements (Speed a Strong Function of Complexity)
2.4 Relationship Between Processor Capability and System
    Requirements (Speed a Weak Function of Complexity)
2.5 Processor Speed and Power (Experimental)
2.6 Processor Speed and Number of Processors Based on a Constant
    Chip Power Level
2.7 Relationship Between Processor Capability and System
    Requirements (Speed a Weak Function of Power)
4.1 Sixteen Point Fast Fourier Transform
4.2 Sixteen Point Fast Fourier Transform Pipeline Implementation
4.3 Sixteen Point Fast Fourier Transform Performed by 4 X 1 Chips
4.4 Sixteen Point Fast Fourier Transform Performed by 2 X 2 Chips
4.5 Sixteen Point Fast Fourier Transform 4 X 1 Reuse Architecture
4.6 Sixteen Point Fast Fourier Transform 2 X 2 Reuse Architecture
4.7 Sixteen Point Fast Fourier Transform Modified 4 X 1 Reuse
    Architecture
4.8 Sixteen Point Fast Fourier Transform Modified 4 X 1 Reuse
    Architecture Non-Interleaved
4.9 Sixteen Point Fast Fourier Transform Modified 4 X 1 Reuse
    Architecture Interleaved
4.10 Sixteen Point Fast Fourier Transform Distributed Among Four
     Multi-Processor Chips
4.11 Sixteen Point Fast Fourier Transform Distributed Between Two
     Chips Using a Reuse Architecture
4.12 Receiving Bus Interface Unit Architecture
4.13 Transmitting Bus Interface Unit Architecture


I. INTRODUCTION


Farmers once used oxen to plow their fields. And when the task got too big for
one ox they did not try to grow a bigger ox. They got two of them. [Ref. 1]


A. THE NEED FOR PARALLEL PROCESSING 

So too have we often found that one computer is not enough, or at least, not fast 
enough for many applications. While progress on producing faster single processor 
computers continues, it is the orders of magnitude leap in speed possible in 
multiple-processor computers that promises to lead computing into its Fifth
Generation.


Multiple-processor computers became necessary because a limit to higher speed
had been reached with brute-force approaches employing faster switching devices.
Faster components made with gallium arsenide or Josephson junction devices can
increase computer speed only 10 times if current uniprocessor architectures are
used; however with the new architectures, there is hope of increasing speed 100 to
1000 times. [Ref. 2]


Such dramatic increases in computer speed would be of great benefit to 
researchers working on computationally-intensive and/or real time problems such as 
adaptive antenna control, weather prediction, or fusion reactor design. It is not merely
a question of having the answers in seconds instead of minutes--once machines can 
perform calculations in real time, whole new applications suddenly become possible. 

As an example, consider a computer system which calculates the power spectral 
density of intercepted radar emitters. A system which takes an hour to analyze a few 
seconds’ worth of data may be useful to compile electronic intelligence data back at 
fleet headquarters--it produces answers long after the event is over. However, if the 
system could perform its analysis in real time it could be used onboard ship or in an 
aircraft to recognize hostile missile seekers and dispense chaff or activate jammers--that 
is, to respond to events as they happen. Increased speed alone could make this new
application possible.


B. PARALLEL PROCESSORS DEPEND ON COMMUNICATION 


When using a number of processors on a single problem, the exchange of data
between processors becomes a critical bottleneck. [Ref. 3]

Extensive research has already been conducted in many areas related to parallel
processing, such as task distribution and software development. The research reported
in this paper focused on the architecture of parallel-processing systems, especially with
regard to inter-processor communications.
A system which uses more than one processor to perform a task must provide 
communication paths between the processors. There are essentially two approaches to 
this requirement: 


• provide a path from every processor to every other processor--"exhaustive"
  communications

• provide paths between each processor and only some of the other
  processors--"limited" communications.


Figure 1.1 Exhaustive Communications.


1. Exhaustive Communications 
An exhaustive communication architecture (Figure 1.1) provides direct data
exchange without bus contention or waiting. However, as the number of processors
rises, the number of communication paths in an exhaustive architecture becomes
impractically large, leading to high costs. In addition, expansion of the network may
be limited by the inability of the existing processors to accept another communication
port. These difficulties with exhaustive communication architectures have led many
researchers to consider architectures based on limited communications.
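
The growth in path count can be made concrete with a short sketch. It is not taken from the thesis: it assumes every path is a bidirectional point-to-point link, so that N processors need N(N-1)/2 links and N-1 ports each, and the function names are illustrative.

```python
def exhaustive_links(n: int) -> int:
    """Bidirectional point-to-point links needed to connect every pair of n processors."""
    return n * (n - 1) // 2


def ports_per_processor(n: int) -> int:
    """Communication ports each processor needs in an exhaustive network."""
    return n - 1


if __name__ == "__main__":
    # Assumed example sizes; the thesis does not specify a processor count here.
    for n in (4, 16, 64, 256):
        print(f"{n:4d} processors: {exhaustive_links(n):6d} links, "
              f"{ports_per_processor(n):4d} ports each")
```

Under these assumptions the link count grows quadratically while the port count per processor grows linearly, which is the expansion limit described above.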
2. Limited Communications 

Among limited communication architectures, [Ref. 4] identifies two major groups:
dedicated path and shared path structures. Limited architectures employing dedicated
paths enable a processor to exchange data without bus contention or waiting, but only
with a limited number of processors. Figures 1.2 and 1.3 show two examples of a
limited communication architecture employing dedicated paths.





Figure 1.2 Limited Communications--Dedicated Path Loop.

Parallel-computing systems built around a limited communications-dedicated
path concept can take advantage of the immediate communication between a given
processor and the processors adjacent to it. Yet if a problem requires communication
between non-adjacent processors, the message must be passed along by all the
intermediate processors. Should the message reach a busy node, it may be delayed or
even discarded, forcing a re-transmission. The resultant communication overhead
could tie up the system and severely slow its operation.
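
As a rough illustration of the relaying cost, the sketch below counts the intermediate processors a message must pass through on a loop such as the one in Figure 1.2. It assumes the loop is bidirectional and that a message takes the shorter direction; neither assumption is stated in the thesis, and the function name is hypothetical.

```python
def ring_relays(src: int, dst: int, n: int) -> int:
    """Intermediate processors that must forward a message on an n-node
    bidirectional ring (assumption: the shorter direction is always used)."""
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("processor index out of range")
    if src == dst:
        return 0
    distance = abs(src - dst)
    hops = min(distance, n - distance)  # links traversed on the shorter arc
    return hops - 1                     # every hop but the last lands on a relay


if __name__ == "__main__":
    n = 8  # assumed ring size, for illustration only
    for dst in range(1, n):
        print(f"0 -> {dst}: {ring_relays(0, dst, n)} relaying processors")
```

Adjacent processors need no relay at all, while the worst case on this assumed eight-node ring ties up three other processors for a single message.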





Figure 1.3 Limited Communications--Dedicated Path 
Regular Network. 


Using a shared path (as in Figure 1.4) eliminates the need to relay data from
one processor to another, because an uninterrupted path already exists between any
two processors. For this reason, limited shared-path architectures are more flexible in
the kinds of data flows which can be achieved and in the types of problems which can
be solved than limited dedicated-path architectures. However, because processors must
wait their turn to use the common communication path, system throughput may suffer
unless the common bus runs at such a high speed that the processors can barely keep
up with it. Such a high-speed bus design would require a multiplexer on each chip
capable of speeds considerably in excess of those associated with conventional
multiplexers. The Optoelectronic Multiplexer (OM) developed by the Naval Ocean
Systems Center, San Diego, is such a device.
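
To see why bus speed matters, the following sketch divides a shared bus evenly among the processors. It assumes fixed, equal time slots with no arbitration or framing overhead, which is a simplification introduced here; the 1 Gbps and 7 Gbps figures are the single-wire and Optoelectronic Multiplexer rates quoted in the abstract, and the processor count of 16 is only an example.

```python
def per_processor_rate(bus_rate_bps: float, n_processors: int) -> float:
    """Average rate each processor gets on a shared bus, assuming equal
    time slots and no arbitration overhead (a simplifying assumption)."""
    if n_processors < 1:
        raise ValueError("need at least one processor")
    return bus_rate_bps / n_processors


if __name__ == "__main__":
    n = 16  # assumed processor count, for illustration only
    for label, rate in (("conventional wire", 1e9), ("OM bus", 7e9)):
        share = per_processor_rate(rate, n)
        print(f"{label}: {share / 1e6:.1f} Mbps per processor")
```

A faster bus raises every processor's share in proportion, so the waiting penalty of a shared path shrinks as the multiplexer speed rises.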


Figure 1.4 Limited Communications--Shared Path.


C. THE OPTOELECTRONIC MULTIPLEXER CONCEPT
1. Optical Switching Yields High Speed
The Optoelectronic Multiplexer employs optically-activated junctions to
sequentially link parallel data lines onto a serial bus. [Ref. 5] A laser pulse, fed to the
junction by optical fiber, activates the junction, allowing conduction from the input
line onto the main data transmission line. By using a different length of optical fiber
for each junction, the laser pulses will arrive at the junctions at different times.
Consequently, the junctions are activated one at a time, which converts the parallel
data waiting on the input lines to serial data pulses travelling along the output
transmission line. The short pulsewidths generated by the laser allow extremely high
pulse repetition frequencies--researchers have tested a prototype laser multiplexer at
speeds as high as 7 Gbps. [Ref. 5]
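
The fiber-length staggering described above can be sketched numerically. The example assumes a fiber refractive index of 1.5 and one junction per bit slot; both values are illustrative assumptions rather than parameters reported for the prototype, and the function names are hypothetical.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def fiber_length_step(bit_rate_bps: float, refractive_index: float = 1.5) -> float:
    """Extra fiber length (metres) that delays a laser pulse by one bit period.
    The refractive index of 1.5 is an assumed, typical value."""
    bit_period_s = 1.0 / bit_rate_bps
    return (C_VACUUM_M_PER_S / refractive_index) * bit_period_s


def fiber_lengths(n_lines: int, bit_rate_bps: float,
                  base_length_m: float = 0.0,
                  refractive_index: float = 1.5) -> list[float]:
    """Fiber length for each of n_lines junctions so that junction k fires
    k bit periods after junction 0."""
    step = fiber_length_step(bit_rate_bps, refractive_index)
    return [base_length_m + k * step for k in range(n_lines)]


if __name__ == "__main__":
    step_cm = fiber_length_step(7e9) * 100
    print(f"length step at 7 Gbps: {step_cm:.2f} cm")
    print([round(length * 100, 2) for length in fiber_lengths(8, 7e9)])
```

Under these assumptions each successive junction needs roughly 2.9 cm more fiber than the previous one at 7 Gbps, so the serial ordering is set entirely by passive fiber lengths rather than by electronic switching logic.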
2. A Suitable Architecture Sought 

Current research [Refs. 6 - 10] is especially rich in parallel-processing
architectures based on limited communication dedicated-path concepts, because shared
path communications typically involve delays which could detract from the high
performance otherwise achievable by parallel-processing designs. Prompted by the
development of the high-speed Optoelectronic Multiplexer, which promises an increase
in serial communication speed of at least one and perhaps two orders of magnitude,
this project evaluated the impact of using a shared bus and serial communication in a
parallel processing computer architecture. Specifically, the following questions were
posed: With current technology, is it feasible to fabricate an Optoelectronic
Multiplexer-based multiple processor chip? What new architectures are made possible
by the OM's high speed? Which architecture makes optimum use of this new
capability?


Figure 1.5 Optoelectronic Multiplexer Block Diagram.


Four conditions would have to be met in order for a single-chip OM-based
parallel processor to be feasible:

• IC manufacturing technology should be able to fabricate enough transistors on a
single chip to create a multi-processor chip.

• A large chip partitioned into many processors would produce higher throughput
than the same chip fabricated as a large uniprocessor.

• Chip throughput (measured in bits per second) would exceed the capacity of
conventional multiplexers, justifying the use of the OM.

• The package of such a multiple processor chip would require so many pins that
package size would be excessive, so a multiplexer would be used instead.


The first condition is easily dealt with by a specific example. The Intel 8080
microprocessor contained about 4500 transistors [Ref. 12], while Motorola’s MC68020
contains about 200000 [Ref. 13]. Using the technology of the Motorola MC68020, one
could produce a chip with over 40 Intel 8080s. Clearly, manufacturers can already
fabricate a multiple-processor chip. The remaining points require further discussion
and are covered in Chapters II and III.


II. OPTIMUM ARCHITECTURE OF LARGE INTEGRATED CIRCUITS


Chapter I’s demonstration that a multiple-processor chip could be fabricated 
prompts the following questions: 


• Is a multiprocessor chip the best use of IC fabrication technology, or should all
available transistors be assembled into a single processor?

• How large (in terms of transistor count, heat dissipation, and number of
processors) would a chip have to be in order to justify the use of the
Optoelectronic Multiplexer?


A. PARTITIONING SILICON FOR MAXIMUM THROUGHPUT

Should designers divide the available silicon among a few large and capable
processors or among many, less capable processors? Which mix yields the highest
throughput?

Consider a system of N processors, each executing the same program and 
producing the same number of output data words each second. Applications of such 
architectures abound in the field of real time signal processing, which uses regularly 
structured algorithms. As N increases, processors share the load, so each may run 
more slowly without changing the speed of the system. If we imagine a system
throughput goal of R bits per second (bps), then:

$R = NS$    (eqn 2.1)

where R = System throughput (bps)
N = Number of processors
S = Throughput of each processor (bps).

$S_{req'd} = RN^{-1}$    (eqn 2.2)

where $S_{req'd}$ = Speed required of each processor
in order to meet the system goal of R bps.


These equations describe what is required of a processor--but how does a
processor's actual performance vary with N? At issue is the apportionment of the
entire chip’s allotment of transistors and heat dissipation ability among N processors.

1. Transistor Constraints

Assuming we can put only so many devices on a chip, then:

$t = TN^{-1}$    (eqn 2.3)

where t = complexity of any processor, measured in transistors
N = number of processors
T = Total number of transistors on chip


Generally, a complex processor will be able to perform a given calculation 
faster than a simple processor. For example, a microprocessor with an on-board 
floating-point unit can handle a multiplication in a few clock cycles, while a smaller 
processor has to do tedious successive additions, requiring much more time. But what 
is the exact relationship between processor complexity and speed? To answer this we 
shall examine the specifications of some existing processors, as listed in Table I and 


graphed in Figure 2.1. 


TABLE I
SPECIFICATIONS OF SOME ACTUAL PROCESSORS
Reference Data Word Time Required for Bit Transistor 
(Bits) Multiplication Rate Count 
(107° sec) (10® sec™+) (thousand) 


[Table I entries are illegible in the source scan.]

Figure 2.1 Processor Speed and Complexity (Experimental). Axes: processor speed (bps) versus complexity.


From the experimental relationships between processor speed and complexity
shown in Figure 2.1, we can see that the data in each group are approximated by the
equation:

$S_{proc} = At^{a}$    (eqn 2.4)

where $S_{proc}$ = processor speed (in bps throughput)
t = processor complexity (in number of transistors)
A = empirical constant of proportionality given in Table II
a = empirical constant given in Table II.


TABLE II
EXPERIMENTAL CONSTANTS


Group A a 
CPU 81-82 6.69 102, 0.783 
CPU 82-85 5.16 * 102° 2.07 
FPU 4.22 x 10 0.711 


Equation 2.4 describes how, in some typical one-processor systems, processor
speed is related to complexity. To apply these findings to an N-processor system of T
transistors, we combine equations 2.3 and 2.4:

$S_{proc} = A(TN^{-1})^{a}$    (eqn 2.5)

$S_{proc} = AT^{a}N^{-a}$

$S_{proc} = KN^{-a}$

where $S_{proc}$ = processor speed (in bps throughput)
t = processor complexity (in number of transistors)
N = number of processors
A and a are constants given in Table II
$K = AT^{a}$, a constant for a given chip.


A family of “processor curves” may be used to describe the tradeoff between
individual processor speed and the number of processors, constrained by a constant
number of transistors. The tradeoffs are shown in Figure 2.2. For example, consider
the curve labeled “CPU 82-85,” which is based on a constant $10^{6}$ transistors per chip.
If these transistors are divided into 10 processors of $10^{5}$ transistors each, Equation 2.4
predicts that each will produce about $116 \times 10^{6}$ bps of output. But if the chip is
divided into more (for example 25) processors of $4 \times 10^{4}$ transistors each, then these
less complex processors will be capable of only about $17.3 \times 10^{6}$ bps each.
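The two quoted speeds can be cross-checked without knowing the constant A, because A cancels when the speeds of two processors in the same group are divided. The sketch below assumes the per-processor transistor counts read as 10^5 and 4 x 10^4 (the exponents are damaged in the scanned text) and uses the exponent a = 2.07 quoted for the CPU 82-85 group; the function name is hypothetical.

```python
# Ratio of per-processor speeds predicted by S_proc = A * t**a (Equation 2.4).
# The constant A cancels in the ratio, so only the exponent a matters.
A_EXPONENT_CPU_82_85 = 2.07   # exponent "a" quoted for the CPU 82-85 group


def speed_ratio(t_large, t_small, a=A_EXPONENT_CPU_82_85):
    """Return S_proc(t_large) / S_proc(t_small) for a power-law speed model."""
    return (t_large / t_small) ** a


if __name__ == "__main__":
    # Transistor counts per processor are assumed readings of the scanned text.
    predicted = speed_ratio(1e5, 4e4)
    quoted = 116 / 17.3
    print(f"ratio predicted by the power law: {predicted:.2f}")
    print(f"ratio of the two quoted speeds:   {quoted:.2f}")
```

The two ratios agree to within about one percent, which supports reading the example as a 2.5-fold reduction in processor complexity.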

When we superimpose these processor curves (Figure 2.2) with a family of
“system” curves, generated by choosing several values of “R” in Equation 2.2, the result
(Figures 2.3 and 2.4) yields a strategy for choosing N. Where the processor curve
(describing what the processor can do) intersects the system curve (describing what
each processor must do) determines the number of processors (N) into which the chip
should be divided to yield that particular level of system throughput. For example, to
achieve a system throughput of $10^{9}$ bps, Figure 2.3 shows the chip should be divided
into about 12 processors (point A). Yet choosing to partition the silicon into fewer,
larger processors (point B) yields a higher system throughput of $2 \times 10^{9}$ bps.

In general, when processor speed is a strong function of complexity, that is
when:

$S_{proc} = At^{a}$ with $a > 1$    (eqn 2.6)

then $S_{proc}$ is proportional to $N^{-a}$ ($a > 1$) while $S_{req'd}$ is proportional to $N^{-1}$. Thus,
$S_{proc}$ falls faster than $S_{req'd}$ as N increases. In this case, the highest performance will
always result from choosing the lowest N possible, in other words N=1. This strategy
may be constrained for very large values of T--there may not be a processor design
which can effectively use $10^{7}$ transistors, for example. Also, the optimistic relationship
of Equation 2.6 may not hold for large values of t.

On the other hand, when a weak relationship exists between speed and
complexity, as shown in Figure 2.4, the best strategy is to select N as large as possible.
As before, however, there are limits to this rule. It may be impractical to divide the
computational task beyond a certain point. For example, a 256-point FFT probably
can not be efficiently shared by more than 128 x 8 = 1024 processors.¹ Also, as N
increases and t decreases, processors will eventually become too simple to function as
microprocessors. For example, excessive reduction in processor complexity could yield
a circuit unable to retain a data word or perform a basic calculation.

Figure 2.2 Processor Speed and Number of Processors Based on a Constant Number of Transistors.

Figure 2.3 Relationship Between Processor Capability and System Requirements (Speed a Strong Function of Complexity).

Figure 2.4 Relationship Between Processor Capability and System Requirements (Speed a Weak Function of Complexity).
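The partitioning rule discussed above (few large processors when a > 1, many small ones when a < 1) follows from the system throughput N * A * (T/N)^a. The sketch below is a hedged illustration: the value A = 1.0 is a placeholder, since the choice of N that maximizes throughput does not depend on A, and the function names are hypothetical.

```python
def system_throughput(n, total_transistors, A, a):
    """System throughput N * S_proc, with S_proc = A * (T / N)**a (eqns 2.3, 2.4)."""
    t_per_processor = total_transistors / n
    return n * A * t_per_processor ** a


def best_partition(candidates, total_transistors, A, a):
    """Return the candidate N giving the highest system throughput."""
    return max(candidates,
               key=lambda n: system_throughput(n, total_transistors, A, a))


if __name__ == "__main__":
    T = 1e6                      # assumed transistor budget for illustration
    choices = range(1, 101)      # candidate processor counts
    # A = 1.0 is a placeholder; it scales throughput but not the best N.
    print("a = 2.07  -> best N:", best_partition(choices, T, 1.0, 2.07))
    print("a = 0.711 -> best N:", best_partition(choices, T, 1.0, 0.711))
```

With a = 2.07 the best choice is the smallest N offered, and with a = 0.711 it is the largest, subject to the practical limits noted in the text.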
2. Power Constraints
Each chip can only dissipate a given amount of heat. The power available to
any individual processor is:

$p = PN^{-1}$    (eqn 2.7)

where p = Power available to any one processor
N = number of processors
P = Total power available to the chip


TABLE III
SPECIFICATIONS OF SOME ACTUAL PROCESSORS
Reference    Data Word (Bits)    Time Required for Multiplication    Bit Rate    Power (watts)

[Table III entries are illegible in the source scan.]

¹ There are 256 ÷ 2 = 128 processors per stage and log₂(256) = 8 stages.


Examining the relationship between processor speed and power in the light of
data from actual processors (Table III and Figure 2.5), there is no clear trend evident
in Figure 2.5. In particular, there is a great deal of scatter in the CMOS multiplier
chip data. This may be due to differences in the way researchers report power
dissipation data; for example, some may report only the power consumption of the
computational segment, while others report the power used by the entire chip,
including bus drivers. In spite of these limitations, one interpretation of the
power/throughput data is:

$S_{proc} = Bp^{b}$    (eqn 2.8)

where $S_{proc}$ = processor speed (in bps throughput)
p = processor power (in watts)
B and b are empirical constants given in Table IV.

Therefore, combining equations 2.7 and 2.8 as before:

$S_{proc} = B(PN^{-1})^{b}$    (eqn 2.9)

$S_{proc} = BP^{b}N^{-b}$

$S_{proc} = KN^{-b}$

where $S_{proc}$ = processor speed (in bps throughput)
p = processor power (in watts)
N = number of processors
B and b are empirical constants given in Table IV.

Figure 2.6 shows the relationship described in equation 2.9, namely, the
tradeoff of individual processor speed against the number of processors, constrained
this time by a constant power level, as required by equation 2.7. Since, for the group
of actual processors examined,

$S_{proc} = Bp^{b}$ with $b < 1$    (eqn 2.10)

Figure 2.7 shows that the best strategy is to select N as large as possible.


Figure 2.5 Processor Speed and Power (Experimental). Axes: processor speed versus power (watts); legend: NMOS CPU, CMOS FPU.


TABLE IV
EXPERIMENTAL CONSTANTS


Group 


NMOS cpu’s 
CMOS fpu’s 





B. MINIMUM CHIP SIZE FOR OM APPLICATION 
How large (in terms of transistor count, heat dissipation, and number of 
processors) would a chip have to be in order to produce sufficient throughput to justify 
the use of the Optoelectronic Multiplexer? 
1. Minimum Transistor Count 
Assuming the individual processors are of low complexity (like the FPU group
of Figure 2.1) implies that:

$S_{proc} = At^{a}$ with $a < 1$    (eqn 2.11)

where $S_{proc}$ = processor speed (in bps throughput)
t = processor complexity (in number of transistors)
A = empirical constant of proportionality given in Table II
a = empirical constant, here < 1.


For this group, the discussion in the previous section shows that the
maximum throughput is achieved by partitioning the available silicon into the largest
number of processors possible, limited by the minimum complexity of the simplest
processor design.² Therefore:

$N_{max} = Tt_{min}^{-1}$    (eqn 2.12)

where T = total number of transistors on chip
$t_{min}$ = complexity of the simplest processor design, measured in transistors
$N_{max}$ = number of simple processors possible on a chip of T transistors

² While the components of systolic arrays are less complex than the assumed
simplest processor, this research did not study the performance of such
ICs--accordingly they are not considered here.

Figure 2.6 Processor Speed and Number of Processors Assuming Constant Chip Power Level.

Figure 2.7 Relationship Between Processor Capability and System Requirements (Speed a Weak Function of Power).


Since each processor produces an output of $S_{proc}$ bits per second and there
are $N_{max}$ processors, the system throughput is:

$S_{sysmax} = N_{max}S_{proc}$    (eqn 2.13)

$S_{sysmax} = N_{max}At_{min}^{a}$    (eqn 2.14)

$S_{sysmax} = Tt_{min}^{-1}At_{min}^{a}$    (eqn 2.15)

$S_{sysmax} = TAt_{min}^{a-1}$    (eqn 2.16)

Defining $S_{o}$ to be the minimum system throughput for which use of the OM
is justified leads to:

$S_{o} = T_{min}At_{min}^{a-1}$    (eqn 2.17)

$T_{min} = S_{o}t_{min}^{1-a}A^{-1}$    (eqn 2.18)

where $T_{min}$ = minimum number of transistors on chip for OM usage to be justified


To estimate the value of $T_{min}$, assume:

$t_{min} = 7 \times 10^{3}$ transistors (lower end of FPU group in Table I)
$A = 4.22 \times 10^{5}$ (Table II)
$a = 0.711$ (Table II)
$S_{o} = 3 \times 10^{9}$ bps (currently the upper range of conventional multiplexers) [Refs. 34,35,36,37]

Therefore:
$T_{min}$ = 92000 transistors
$N_{max}$ = 13 processors


Thus, since processors with transistor counts > $T_{min}$ are already in existence
[Ref. 21], it seems that an OM-based single chip multiple processor is feasible with
respect to the number of transistors required.
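The minimum transistor count can be evaluated directly from the relation T_min = S_o * t_min^(1-a) / A. The sketch below is an assumption-laden illustration: the scanned exponents are damaged, so the inputs t_min = 7 x 10^3, A = 4.22 x 10^5, a = 0.711 and S_o = 3 x 10^9 bps are readings chosen because they are mutually consistent, and the function name is hypothetical.

```python
def min_transistors(s_o, t_min, A, a):
    """T_min = S_o * t_min**(1 - a) / A, the chip size at which throughput reaches S_o."""
    return s_o * t_min ** (1.0 - a) / A


if __name__ == "__main__":
    # Assumed readings of the damaged exponents in the scanned text.
    S_O = 3e9        # bps, upper range of conventional multiplexers
    T_MIN = 7e3      # transistors in the simplest processor design
    A = 4.22e5       # FPU-group constant of proportionality
    a = 0.711        # FPU-group exponent

    chip = min_transistors(S_O, T_MIN, A, a)
    print(f"minimum transistors on chip: {chip:,.0f}")
    print(f"processors on that chip:     {int(chip / T_MIN)}")
```

Under these assumed inputs the result is close to 92000 transistors, or about 13 of the simplest processors.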
2. Minimum Power Dissipation
What is the minimum heat dissipation of a multi-processor chip which would
yield throughput in the OM range?

$N_{max} = Pp_{min}^{-1}$    (eqn 2.19)

where P = Total power dissipation of the chip (watts)
$p_{min}$ = power used by the simplest processor design, measured in watts
$N_{max}$ = number of simple processors possible on a chip of P watts

$S_{sysmax} = N_{max}S_{proc}$    (eqn 2.20)

Substituting from Equation 2.8,

$S_{sysmax} = N_{max}Bp_{min}^{b}$    (eqn 2.21)

And, substituting for $N_{max}$ from Equation 2.19,

$S_{sysmax} = Pp_{min}^{-1}Bp_{min}^{b}$    (eqn 2.22)

$S_{sysmax} = PBp_{min}^{b-1}$    (eqn 2.23)

Defining $S_{o}$ to be the minimum system throughput for which use of the OM
is justified leads to:

$S_{o} = P_{min}Bp_{min}^{b-1}$    (eqn 2.24)

$P_{min} = S_{o}p_{min}^{1-b}B^{-1}$    (eqn 2.25)

where $P_{min}$ = minimum power dissipation of the chip for OM usage to be justified

To estimate the value of $P_{min}$, assume:

$p_{min} = 0.10$ watts (lower end of CMOS FPU group in Table III)
$B = 3.43 \times 10^{8}$ (Table IV)
$b = 0.099$ (Table IV)
$S_{o} = 3 \times 10^{9}$ bps

Therefore:
$P_{min}$ = 1.1 watts
$N_{max}$ = 11 processors

This power level is quite reasonable, and it would seem that from the
standpoint of heat dissipation an OM-based multiple processor chip is feasible.
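The power estimate has the same form as the transistor estimate: P_min = S_o * p_min^(1-b) / B. The sketch below assumes B = 3.43 x 10^8 and S_o = 3 x 10^9 bps (the exponents are damaged in the scan and are inferred here, not confirmed) together with p_min = 0.10 W and b = 0.099; the function name is hypothetical.

```python
def min_chip_power(s_o, p_min, B, b):
    """P_min = S_o * p_min**(1 - b) / B, the chip power at which throughput reaches S_o."""
    return s_o * p_min ** (1.0 - b) / B


if __name__ == "__main__":
    # Assumed readings of the damaged exponents in the scanned text.
    S_O = 3e9       # bps, upper range of conventional multiplexers
    P_MIN = 0.10    # watts used by the simplest processor design
    B = 3.43e8      # CMOS FPU constant of proportionality (inferred exponent)
    b = 0.099       # CMOS FPU exponent

    power = min_chip_power(S_O, P_MIN, B, b)
    print(f"minimum chip power:      {power:.2f} W")
    print(f"processors on that chip: {round(power / P_MIN)}")
```

With these inputs the chip dissipates a little over one watt spread across roughly eleven simple processors.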


III. THE NEED FOR A HIGH-SPEED MULTIPLEXER


Chapter II demonstrated that current technology could produce a chip whose 
throughput would exceed the capacity of conventional multiplexer technology. But 
why consider serial communications and multiplexers at all? Why not exchange data 


with the chip in parallel via pins or leads? 


A. PROCESSOR POWER LIMITED BY COMMUNICATION PATH 

We have seen that future high-density ICs may be optimally structured as a
bank of many processors, each of moderate capability. However, even if
manufacturers can achieve sufficient circuit density to fabricate a multi-processor chip,
such a device might not be practical due to the large number of leads needed to
communicate with each processor from off-chip. For example, imagine an N-processor
IC designed to compute a 2N-point Fast Fourier Transform (FFT). During the
computation, the IC must read in, then write out, 2N complex words, or 4N
real words. Assuming a 40 bit word size, and using the same pins for input and
output, we can see this IC would need:

$\frac{40\ \text{leads}}{\text{word}} \times [4N\ \text{words}] = 160N\ \text{leads}$    (eqn 3.1)

How large a package will we need to handle all these leads? Using a Pin-Grid
Array (PGA) package with pins spaced every 0.1 inch, the area of the package is:

$\text{Area} = \frac{[160N\ \text{leads}]}{\left[\frac{10\ \text{leads}}{25.4\ \text{mm}}\right]^{2}} = 1032N\ \text{mm}^{2}$    (eqn 3.2)


For illustrative purposes we can estimate the area of the silicon chip in this
package by assuming the chip size of the processor is approximately the same as that
of the processor recently reported by the Matsushita Corporation of Osaka. [Ref. 28]
Their processor performs a 32 bit floating point multiplication in about 75 nsec and is
32.6 mm² in area. A chip containing N of these processors would occupy about 32.6N
mm² of silicon. Thus, the ratio of silicon area to package area in our hypothetical IC
is:

$\frac{\text{Silicon Area}}{\text{Package Area}} = \frac{[32.6N]}{[1032N]} = 3.2\%$    (eqn 3.3)
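The pin-count argument can be reproduced numerically. The sketch below uses the figures stated in the text (40-bit words, 4N real words, 0.1 inch pin pitch, 32.6 mm² of silicon per processor); the function names and default arguments are hypothetical conveniences.

```python
MM_PER_INCH = 25.4


def leads_required(n_processors, bits_per_word=40, real_words_per_processor=4):
    """Leads needed to move 4N real words in parallel, one lead per bit."""
    return bits_per_word * real_words_per_processor * n_processors


def package_area_mm2(leads, pitch_inch=0.1):
    """Area of a pin-grid array with one lead per pitch-by-pitch square."""
    pitch_mm = pitch_inch * MM_PER_INCH
    return leads * pitch_mm ** 2


def silicon_to_package_ratio(n_processors, silicon_mm2_per_processor=32.6):
    """Fraction of the package footprint actually occupied by silicon."""
    silicon = silicon_mm2_per_processor * n_processors
    return silicon / package_area_mm2(leads_required(n_processors))


if __name__ == "__main__":
    n = 8  # example processor count, chosen only for illustration
    print("leads:", leads_required(n))
    print(f"package area: {package_area_mm2(leads_required(n)):.0f} mm^2")
    print(f"silicon/package ratio: {silicon_to_package_ratio(n):.1%}")
```

The ratio is independent of N, which is why the waste of package area persists however many processors are placed on the chip.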


As IC fabrication technology improves, this waste of space gets even worse. A
new production technique enabling manufacturers to produce circuits in half the silicon
area previously required would permit us to double “N” without increasing the silicon
area. Yet package area would double, due to increased pinout requirements. Once
some maximum package size is reached, further improvements in circuit density do us
no good--we simply can not communicate with more processors. As one researcher
stated, “the technology has become increasingly constrained by packaging limitations”
[Ref. 38].

Increasing lead density will produce some relief from this communication limit,
but can not be pursued beyond some maximum without excessive fabrication cost. We
are faced, then, with some maximum package size and maximum lead density, implying
an eventual limit on the number of leads a single IC can have.

Given this eventual limit on the number of simultaneous off-chip communication 
paths, Rent’s Rule [Ref. 12:p. 235] 


TAG <° (eqn 3.4) 
where P = Number of chip pads or leads 
G = Number of gates on the chip 


would seem to imply that if the number of paths (P) is limited, then so is the number
of gates (G) and, therefore, microprocessor complexity and computational power.

This ultimate limit on non-multiplexed designs is not precisely defined. Neither
maximum package size nor maximum lead density has yet been reached, and industry
experts are wary of predicting when they might be. In addition, the switch to
multiplexed designs will probably occur over a range of processor densities and
complexities, influenced by market factors (there will be few customers for very large
packages) and manufacturing realities (specialized chip sizes mean more expensive chip
handling equipment) as well as the theoretical factors described above.

For all these reasons, large ICs composed of multiple processors will require too
many pins to use a conventional parallel-transfer scheme with pins or leads. Instead a
serial communications link must be considered, and as shown in Chapter II, the speeds
required will exceed the capacity of conventional multiplexers.


IV. SYSTEM ARCHITECTURE BASED ON SERIAL COMMUNICATION 


Chapters II and III demonstrate that, in the next generation of ICs, a
microprocessor may very well be organized as a bank of smaller processors, all sharing
a relatively few pins through a high-speed multiplexer. But:

• What on-chip data flow architecture should be employed among these
processors?

• How can a serial data stream be distributed among N processors?

• What are the detailed structures of the elements which make up an OM-based
architecture?

A. ON-CHIP DATA FLOW ARCHITECTURE

How should an N-processor chip be organized? The ideal structure will vary with
the application; this discussion considers one specific application--computing FFTs.
The number of processors required to compute a given size FFT will depend on
whether processors are “reused,” that is whether a processor bank’s outputs are
shuffled and returned to the same processors (reused) or directed to the next bank of
processors (pipelined). Reusing processors allows a given FFT to be computed with
fewer processors, but takes more time. The architectures associated with both reuse
and pipeline strategies are discussed in the following sections.

1. Pipeline Architecture 

Assuming that the throughput of the system is to be maximized, there will be
no “reuse” of processors. That is:

• each processor performs only a two point FFT “butterfly”

• a new bank of processors performs each stage of the computation in a pipeline
strategy.

The most straightforward architecture for N processors is an N X 1 column.
How would this grouping affect data flow among the processors? As an example,
when the task is a 16 point FFT, the processors must exchange data as shown in
Figures 4.1 and 4.2. Dividing up the 32 processors shown in Figure 4.2 into 4 X 1
chips forces 80 data words to cross chip boundaries during the computation, as shown
in Figure 4.3.

Re-organizing the four processors on each chip into a 2 X 2 matrix (Figure
4.4) results in only 48 words crossing chip boundaries, thereby improving the system’s
throughput, since off-chip communication delay is lessened, and reducing the demands
on the communications network.

The 2 X 2 structure is more efficient because it is the structure of a four point
FFT. In a sense, the 2 X 2 structure performs all the computations possible on the
four points it receives, while the 4 X 1 array, receiving eight points, must hand off its
data only partially “chewed.”

There are many such matrices, each corresponding to a particular FFT. For 
example, Figure 4.2 suggests that a 32 point processor IC designed for FFT 
computation would best be configured as a 8 X 4 matrix. In general, the matrix 


dimensions are: 
2h en wiiere m-h2, 3.28 (eqn 4.1) 


2. Reuse Architecture 


The number of processors required by a pipeline architecture to compute a
P-point FFT is:

$\frac{P}{2}\log_{2}P$    (eqn 4.2)
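The processor count for a fully pipelined FFT follows from two facts used elsewhere in the text: each stage needs one two-point butterfly processor per pair of points, and a P-point transform has log2(P) stages. A minimal sketch, with a hypothetical function name, is:

```python
def pipeline_layout(points):
    """Return (processors per stage, stages, total processors) for a P-point FFT.

    Assumes P is a power of two and that each processor performs one
    two-point butterfly, as in the pipeline architecture described above.
    """
    if points < 2 or points & (points - 1):
        raise ValueError("points must be a power of two, at least 2")
    stages = points.bit_length() - 1      # log2(points) for a power of two
    per_stage = points // 2               # one butterfly per pair of points
    return per_stage, stages, per_stage * stages


if __name__ == "__main__":
    for p in (4, 16, 256):
        per_stage, stages, total = pipeline_layout(p)
        print(f"{p:3d}-point FFT: {per_stage} x {stages} = {total} processors")
```

For 16 points this gives the 8 X 4 arrangement of 32 processors shown in Figure 4.2, and for 256 points the 128 x 8 = 1024 processors mentioned in Chapter II.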


This number of processors may prove to be impractical or simply too
expensive, or we may not need the ultimate throughput achievable by the pipeline
architecture, yet still need more throughput than that provided by a uniprocessor.
Also, it may be desirable to adapt an existing pipeline system to compute larger
FFTs--without adding processors. In each of these cases, reusing processors in the
computation enables the designer to trade off system throughput for design complexity
and cost. How are data exchanged among processors in a reuse architecture?

The computations shown in Figure 4.2 still must be performed, but now
instead of each block representing an actual processor, it represents a job that some
processor will have to perform. For example, consider a 16-point FFT performed with
two 4-processor ICs. Figure 4.5 shows the data exchange for this example if the
four-processor ICs are organized as 4 X 1 vectors.

As shown in Table V, even though a chip processes eight points every frame,
it only transmits four points per frame--keeping half its data onboard for further
processing with the half it will receive from the other IC. This assumes that some
on-chip communication path exists to enable processors to exchange data.
















Figure 4.1 Sixteen Point Fast Fourier Transform 
Wefo3 pp. 2009-22). 


Figure 4.2 Sixteen Point Fast Fourier Transform Pipeline Implementation.

Figure 4.3 Sixteen Point Fast Fourier Transform Performed by 4 X 1 Chips.

Figure 4.4 [Caption illegible in the source scan; the text cites this figure for the 2 X 2 chip organization.]

Figure 4.5 Sixteen Point Fast Fourier Transform 4 X 1 Reuse Architecture.


tage . Chip A Chip B ; 
Receives Transmits Receives Transmits 


me SH 


fo fs f2 fae ai aS ae az f— fae {6 43 as ag alc ani 
fi f9 f3 f12 {6 f1¢ [7 eas 


a8 a9 al0 ail 64 65 sbeueo7 AG. a5 a6ana7 bs bo bio bili 


bs b9 Bio bil Ci ~cs “c5emen ba bs be b7 C8 WE10 Giliz Gia 


XS Ww bo 


cs cio ciz2 c14 Fo Fe F4 Fiz ci cs cs c7 9) bi ito see 
P25 rio Fe ic Fs Fis F7 Gas 


Vie 
EC 


Stage 


. Chip A Chip B 
Receives Transmits Receives Transmits 
[Go is ia eae =o Gl is ole ae 


f2 fio fe fi¢ bi bs bo bir f3 fil f7 f15 ba biz be bie 


be biz2 bie Fo Fs Fe Fi2 bi bs be bit Bios 
Fe Fio Fe F14 Fs Fiz F7 Fis 





As an alternative, consider the same 16-point FFT computed by two
4-processor ICs, this time organized as 2 X 2 matrices, as shown in Figure 4.6 and
Table VI.

Because each IC is only two processors “wide,” a single IC can only accept
four data points at a time. This creates an awkward data flow--the source delivers only
half the input vector, waits, then delivers the other half. Each chip must store the
output of its first computation while processing the second half of the input vector.
However, the number of data points exchanged between chips is sharply reduced from
24 for the 4 X 1 case to 8 for the 2 X 2 case.


Figure 4.6 Sixteen Point Fast Fourier Transform 2 X 2 Reuse Architecture.


Stage 


~ 5 ein . _ Chip | 
# Receives Transmits Receives Transmits 


1 fo fs f2 fio br bz bo bir fh fs {3 five ba bie sbomeene 
fa" [12 “fe sie fs [33 17 as 


2 ba be biz bie Fo Fs Fa Fiz bi bz bo bir Fi Fo F3 Fi 
Fe Fio Fe Fi4 Fs Fiz F7 Fis 





A third organization of these four processors permits transmission of the
entire data vector (as in the 4 X 1 chip) and minimizes the data exchange (as in the 2
X 2 chip). Its structure is shown in Figure 4.7 and Table VII.

This structure, possible only if processors are reused, maximizes the “width” of
the chip while preserving the communication advantages of a “deep” chip. As
discussed in the previous section, these advantages stem from performing all the
calculations possible on a given data set before releasing it to another chip. By not
allowing “partially chewed” data off the chip, the number of data to be exchanged
between chips at each stage is minimized. In general, an N-processor chip with this
reuse architecture can perform a 2N-point FFT if organized as an N X 1 vector which
performs $1 + \log_{2}N$ stages.

3. Interleaving Data Sets 

The efficiency and throughput of any of these reuse architectures can be
improved through interleaving data sets--that is, delivering new data to the processors
to work on while they wait for the communications link to recycle their intermediate
outputs back to their inputs. Consider the progress of a 16-point FFT calculation
performed by eight processors organized as in Figure 4.7. The processor wait time is
clearly evident in Figure 4.8, in which the data sets are not interleaved. In this
example, the throughput is one FFT per ($4T_{CALC} + T_{XFER}$).

In Figure 4.9, however, a new data set is delivered to the processors while they
wait for the results of the first phase of the calculation to be recirculated. In this
interleaved case, processors are never allowed to be idle. For this example, throughput
is 2 FFTs per $8T_{CALC}$, or 1 per $4T_{CALC}$--slightly higher than in the non-interleaved case.
This improvement in throughput was achieved without an increase in bus speed;
alternatively one could reduce bus speed requirements without lowering throughput by
incorporating an interleaved reuse architecture.

Figure 4.7 Sixteen Point Fast Fourier Transform Modified 4 X 1 Reuse Architecture.

Figure 4.8 Sixteen Point Fast Fourier Transform Modified 4 X 1 Reuse Architecture Non-Interleaved.

Figure 4.9 Sixteen Point Fast Fourier Transform Modified 4 X 1 Reuse Architecture Interleaved.
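The throughput comparison between the two cases reduces to a pair of one-line formulas. The sketch below assumes the two time symbols in the text denote a calculation time and a bus transfer time per phase; the numerical values and function names are hypothetical and serve only to illustrate the comparison.

```python
def ffts_per_second_non_interleaved(t_calc, t_xfer):
    """One FFT per (4 * T_calc + T_xfer): processors idle during the transfer."""
    return 1.0 / (4.0 * t_calc + t_xfer)


def ffts_per_second_interleaved(t_calc):
    """Two FFTs per 8 * T_calc: a second data set fills the waiting time."""
    return 2.0 / (8.0 * t_calc)


if __name__ == "__main__":
    # Example timings, chosen only for illustration (not from the source).
    T_CALC = 1.0e-6   # seconds per calculation phase
    T_XFER = 0.5e-6   # seconds to recirculate intermediate outputs

    plain = ffts_per_second_non_interleaved(T_CALC, T_XFER)
    mixed = ffts_per_second_interleaved(T_CALC)
    print(f"non-interleaved: {plain:,.0f} FFTs/s")
    print(f"interleaved:     {mixed:,.0f} FFTs/s")
    print(f"gain:            {mixed / plain - 1.0:.1%}")
```

The gain equals T_xfer / (4 * T_calc), so interleaving helps most when the transfer time is a large fraction of the calculation time.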


B. DATA DISTRIBUTION
Data delivery to the processors can be accomplished several ways:

• processing elements all receive the same data in broadcast fashion

• processors “know” when it’s their turn to receive data and they query the
RBIU

• data words are “tagged” with their destination--RBIU reads the tag and delivers
data words to their intended processor

• processors contend for bus access with each other

• RBIU delivers data to processors in a preset schedule.

Only this last scheme (using a preset schedule) promises to have sufficient speed
to be acceptable for use with the OM. But is it possible to use an a priori schedule,
and what would it look like?

1. Pipeline Architecture 

Returning to the example of an FFT computer built of N-processor ICs, Figure
4.4 shows the data exchanges required by a sixteen point FFT if a pipeline architecture
is used.

The sequence of data on the bus is essentially arbitrary. In choosing the
sequence, it is reasonable to avoid sequences which deliver several data words to the
same BIU one right after the other, in order to minimize the speed required of the BIU.
Figure 4.10 shows one suitable choice.

Due to the regular structure of the FFT, there is a simple algorithm to 
calculate the address of any data word’s destination, based only on its position in the 
data stream, as shown in Table VIII. Because of this, the RBIU’s data distribution 
logic can be implemented with little more than a binary counter. The transmission 
algorithm is equally uncomplicated, as shown in Table IX.
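As a minimal sketch of how a destination address can fall out of a binary counter, the example below splits a 4-bit word counter into a chip field and a slot field. The particular bit assignment (low bits select the chip so that consecutive words go to different BIUs) and the names `NUM_CHIPS`, `WORDS_PER_FRAME`, and `destination` are assumptions made for illustration; they are not the schedule tabulated in Table VIII.

```python
NUM_CHIPS = 4          # assumption: four multi-processor chips share the bus
WORDS_PER_FRAME = 16   # one sixteen-point data set per frame


def destination(position):
    """Map a word's position in the stream to (chip, slot) using counter bits only."""
    count = position % WORDS_PER_FRAME   # 4-bit binary counter, wraps every frame
    chip = count & 0b11                  # two low bits pick the chip (hypothetical mapping)
    slot = count >> 2                    # two high bits pick the input slot on that chip
    return chip, slot


# The whole preset schedule is just the counter run through one frame.
schedule = [destination(p) for p in range(WORDS_PER_FRAME)]
```

With this assignment no two consecutive words are addressed to the same chip, which is the property the text asks of a suitable sequence.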

The fact that inter-stage data exchange patterns in the FFT computation are
regular and easily implemented in hardware lends further support to the use of preset
schedules to control BIU data distribution.
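The regularity of the exchange pattern can be illustrated with the standard radix-2 butterfly structure, in which the two operands of every butterfly in a given stage differ in exactly one bit of their index. The sketch below is a generic illustration under that assumption; which bit is used first depends on whether decimation in time or in frequency is chosen, so the stage numbering here is not necessarily that of Figure 4.4. The names `partner` and `exchange_pattern` are hypothetical.

```python
N = 16                        # sixteen-point transform
STAGES = N.bit_length() - 1   # log2(16) = 4 butterfly stages


def partner(index, stage):
    """Index of the data point paired with `index` in the given stage."""
    return index ^ (1 << stage)   # flip one address bit


def exchange_pattern(stage):
    """All butterfly pairs of one stage, each listed once as (low, high)."""
    return [(i, partner(i, stage)) for i in range(N) if i < partner(i, stage)]


for s in range(STAGES):
    print(s, exchange_pattern(s))
```

Because the partner is obtained by inverting a single address bit, the pattern for any stage can be generated by simple logic rather than stored.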


Figure 4.10 Sixteen Point Fast Fourier Transform
Distributed Among Four Multi-Processor Chips.





TABLE VIII
PRESET SCHEDULE FOR DATA DISTRIBUTION
[Table body illegible in the scanned source; the surviving column heading reads “Word Sequence”.]


2. Reuse Architecture 


Tables X and XI and Figure 4.11 show the data flow structure required to
compute a 16-point FFT with a reuse architecture. Although the task is accomplished
with fewer processors than in Figure 4.10, there are three additional complications:
• additional buffers directly connect processors which must exchange data in
intermediate stages of the calculation
• an internal path exists between TBIU and RBIU to allow processors which are
not directly connected to exchange data
• BIUs must coordinate the use of internal and external paths.


TABLE IX
PRESET SCHEDULE FOR DATA DISTRIBUTION
PIPELINE ARCHITECTURE
[Table body illegible in the scanned source; surviving column headings read “Word Sequence” and “Data”.]





C. RECEIVER TASKS 
We can view the data distribution circuitry as being separated into a Receiving
Bus Interface Unit (RBIU) and a Transmitting Bus Interface Unit (TBIU). The RBIU
must:
• capture data from the high-speed bus
• convert data from serial to parallel format
• perform error detection/correction
• deliver the data word to its destination processor.
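The serial-to-parallel and error-correction steps can be sketched in software as follows. This is only an illustration: the text above does not fix a word length or a particular code, so the 7-bit word and the Hamming(7,4) single-error-correcting code used here, and the names `hamming74_correct` and `receive_words`, are assumptions.

```python
def hamming74_correct(code):
    """Correct up to one bit error in a 7-bit Hamming word.

    Bit positions are numbered 1..7 with parity bits at positions 1, 2 and 4.
    Returns (data_bits, error_position); position 0 means no error was seen.
    """
    if len(code) != 7:
        raise ValueError("expected a 7-bit word")
    syndrome = 0
    for position, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= position          # XOR of the positions holding a 1
    corrected = list(code)
    if syndrome:
        corrected[syndrome - 1] ^= 1      # the syndrome names the faulty position
    data = [corrected[2], corrected[4], corrected[5], corrected[6]]
    return data, syndrome


def receive_words(serial_bits, word_len=7):
    """Shift serial bits into a register and emit each completed, corrected word."""
    register = []
    for bit in serial_bits:
        register.append(bit & 1)          # serial-to-parallel conversion
        if len(register) == word_len:
            yield hamming74_correct(register)
            register = []


# Two copies of the same code word; the first has bit 5 inverted in transit.
stream = [0, 1, 1, 0, 1, 1, 1] + [0, 1, 1, 0, 0, 1, 1]
print(list(receive_words(stream)))
```

Delivery to the destination processor would follow the corrected word; that step depends on the preset schedule and is not modeled here.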


TABLE X
PRESET SCHEDULE FOR DATA DISTRIBUTION
REUSE ARCHITECTURE
[Table body illegible in the scanned source; the surviving column heading reads “Word Sequence”.]

Figure 4.12 shows the architecture developed in this project to accomplish these
tasks. Note that this architecture uses a separately distributed clock signal.
This scheme was used to simplify the construction and testing of a system prototype,
but once past this phase the clock could be embedded in the data stream itself (as in
Manchester coding), eliminating the need for a separate clock line. Alternatively, if a
fiber optic data link were used, the clock could be sent on the same fiber as the data,
but at a different carrier frequency (color), allowing clock recovery independently of
data reception.
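To make the idea of an embedded clock concrete, the sketch below encodes each data bit as a pair of half-bit levels with a guaranteed transition in the middle, which is what lets a receiver recover timing from the data line alone. The polarity used here (a 1 is sent as low then high, a 0 as high then low) is one common convention and is an assumption; the function names are hypothetical.

```python
def manchester_encode(bits):
    """Expand each data bit into two half-bit levels with a mid-bit transition."""
    halves = []
    for bit in bits:
        halves.extend((0, 1) if bit else (1, 0))
    return halves


def manchester_decode(halves):
    """Recover data bits; every bit cell must contain a transition."""
    if len(halves) % 2:
        raise ValueError("incomplete bit cell")
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(second)               # with this polarity the second half equals the bit
    return bits


word = [1, 0, 1, 1, 0]
line = manchester_encode(word)
print(line, manchester_decode(line))
```

The price of the embedded clock is that the line toggles at up to twice the data rate, which is a consideration when the bus is already the speed-limiting element.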

TABLE XI
PRESET SCHEDULE FOR DATA DISTRIBUTION
REUSE ARCHITECTURE
[Table body illegible in the scanned source.]

The control signals shown in Figure 4.12 also deserve some discussion. The
RBIU circuitry develops these signals as a function of the bit count, then distributes
the signals depending on which word is currently being received. These signals control
the First-In-First-Out (FIFO) stacks which buffer data between the RBIU and the
Processing Elements (P/E), as well as between the P/Es and the TBIU. These FIFOs
require signals to cause them to:
• load a new data word (from the RBIU)
• output the next word (to the TBIU)
• advance the stack to bring up the next output word (now that the TBIU has the
current word)
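The three control signals listed above can be modeled as three separate operations on a queue. The sketch below is a behavioral illustration only; the class name `WordFifo`, its `depth` parameter, and the error handling are assumptions and do not describe the prototype hardware.

```python
from collections import deque


class WordFifo:
    """Behavioral model of a FIFO stack driven by load, output and advance signals."""

    def __init__(self, depth):
        self.depth = depth
        self._words = deque()

    def load(self, word):
        """LOAD: accept a new data word at the tail of the stack."""
        if len(self._words) >= self.depth:
            raise OverflowError("FIFO full")
        self._words.append(word)

    def output(self):
        """OUTPUT: present the word at the head without removing it."""
        if not self._words:
            raise IndexError("FIFO empty")
        return self._words[0]

    def advance(self):
        """ADVANCE: discard the head so the next word becomes the output."""
        if not self._words:
            raise IndexError("FIFO empty")
        self._words.popleft()


fifo = WordFifo(depth=4)
fifo.load(0xA)
fifo.load(0xB)
first = fifo.output()    # the head word stays available until advance is asserted
fifo.advance()
second = fifo.output()
```

Keeping output and advance as distinct signals lets the transmitter hold the current word for as long as it needs before the stack moves on.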


D. TRANSMITTER TASKS 
The transmission part of the data distribution circuitry must:
• take the data word from its source processor
• convert data from parallel to serial format
• add error detection bits
• insert data onto the high-speed bus
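A minimal software sketch of the check-bit and parallel-to-serial steps follows. The text does not specify the code or the word length, so the 4-bit data word and the Hamming(7,4) check bits used here are assumptions chosen to keep the example small, and the names `hamming74_encode` and `transmit` are hypothetical.

```python
def hamming74_encode(data):
    """Add three parity bits to four data bits.

    Output order is positions 1..7: p1, p2, d1, p4, d2, d3, d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4     # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4     # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4     # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def transmit(words):
    """Parallel-to-serial conversion: emit the coded bits of each word in turn."""
    for word in words:
        for bit in hamming74_encode(word):
            yield bit


serial = list(transmit([[1, 0, 1, 1], [0, 0, 0, 0]]))
print(serial)
```

In hardware the same function would be performed by a parity generator feeding a shift register whose serial output drives the bus.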


Figure 4.11 Sixteen Point Fast Fourier Transform
Distributed Between Two Chips Using a Reuse Architecture.

Figure 4.12 Receiving Bus Interface Unit Architecture.

Figure 4.13 Transmitting Bus Interface Unit Architecture.


Figure 4.13 shows the architecture developed in this project to accomplish these
tasks. The control and timing circuitry needed to interface the output FIFOs with the
TBIU is included as part of the RBIU diagram.


E. CONCLUSIONS AND LIMITATIONS OF THIS RESEARCH 
1. Conclusions 

The high speed serial communication provided by the Optoelectronic
Multiplexer makes possible a shared-bus parallel processing architecture for problems
like the FFT where the data distribution schedule can be determined a priori. The data
distribution algorithms for the FFT are quite simple and can be realized with little
more than a binary counter.

For the FFT, processor groupings on chip should correspond to the 27? Xn
matrices inherent in the FFT calculation in order to minimize the amount of inter-chip
communications.

Trends in actual processor data suggest that the throughput of the processor
in most cases is proportional to the [size of the processor] raised to a power less
than 1, where “size” refers to both transistor count and power dissipation. This
implies that, for a given chip size, dividing the chip into increasing numbers of
smaller processors raises the number of processors faster than it lowers the
throughput of an individual processor.
Thus, for most types of processors studied, the greatest throughput is achieved by
organizing a large chip as a bank of many simple processors.
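The scaling argument can be checked numerically. If a chip of size S is divided into N equal processors and each delivers throughput k(S/N)^a, the chip total is N k (S/N)^a = k S^a N^(1-a), which grows with N whenever a < 1. The sketch below evaluates this expression; the symbols k and a, the function name, and the sample values are assumptions made for illustration and are not fitted to the processor data discussed here.

```python
def chip_throughput(chip_size, num_processors, exponent, k=1.0):
    """Total throughput of a chip split into equal processors.

    Each processor is assumed to deliver k * (its size) ** exponent.
    """
    per_processor = k * (chip_size / num_processors) ** exponent
    return num_processors * per_processor


# Hypothetical numbers: a chip of 16 size units and an exponent of 0.5.
for n in (1, 4, 16):
    print(n, chip_throughput(16.0, n, 0.5))
```

With an exponent of exactly 1 the total would be independent of N, so the advantage of many simple processors rests entirely on the exponent being below 1.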

Finally, a single-chip OM-based parallel processor is feasible since:
• Manufacturers can fabricate sufficient transistors on a single chip to construct
many simple processors.
• A chip composed of only about 12 simple processors, easily achieved with current
fabrication technology, could produce enough throughput in a highly structured
problem (like the FFT) to justify the use of the OM’s high capacity.
• Constructing such a chip in a conventional package using one pin or lead per bit
would require an excessive package size.


2. Limitations and Recommendations 

The architecture described in this report was designed with only the FFT in
mind. It may not be adaptable to less structured calculations or to systems which
must perform a wide variety of calculations.

Multiple-processor chip performance was predicted based on a limited
sampling of current processor data. Further research, using a comprehensive study of
actual processor performance, is needed to augment the simple model developed here.

The comparison of conventional leaded packages and serially multiplexed
packages considered only the extremes of one pin per bit and one pin per chip.
Additional study of alternatives between these endpoints is needed to determine at
what point the cost (in terms of dollars, chip area, and heat) of the Optoelectronic
Multiplexer is justified by its higher performance.




INITIAL DISTRIBUTION LIST

Defense Technical Information Center
Cameron Station
Alexandria, Virginia 22304-6145

Library, Code 0142
Naval Postgraduate School
Monterey, California 93943-5002

Commander (Code 553)
Naval Ocean Systems Center
Attn: Mr. Don Albares
San Diego, California 92152-5000

Commander (Code 553)
Naval Ocean Systems Center
Attn: Mr. Ron Reedy
San Diego, California 92152-5000

Superintendent (Code 62)
Naval Postgraduate School
Attn: Prof. Rigas
Monterey, California 93943-5000

Superintendent (Code 62)
Naval Postgraduate School
Attn: Prof. L. W. Abbott
Monterey, California 93943-5000

Superintendent (Code 62)
Naval Postgraduate School
Attn: ECE Department Office
Monterey, California 93943-5000



